- 14 Feb, 2017 1 commit
-
-
Matthew Wilcox authored
It is a relatively common idiom (8 instances) to first look up an IDR entry, and then remove it from the tree if it is found, possibly doing further operations upon the entry afterwards. If we change idr_remove() to return the removed object, all of these users can save themselves a walk of the IDR tree.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
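A minimal sketch of the before/after idiom, assuming a driver-local IDR and an arbitrary entry type (helper names hypothetical):

    /* before: two tree walks */
    entry = idr_find(&my_idr, id);
    if (entry)
            idr_remove(&my_idr, id);

    /* after: idr_remove() hands back the removed object, one walk */
    entry = idr_remove(&my_idr, id);
    if (entry)
            put_entry(entry);       /* hypothetical further operation */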
-
- 10 Nov, 2016 1 commit
-
-
Richard Weinberger authored
Don't pass a size larger than iov_len to kernel_sendmsg(). Otherwise it will cause a NULL pointer deref when kernel_sendmsg() returns with rv < size.

DRBD as an external module has been around since the kernel 2.4 days. We used to be compatible with 2.4 and very early 2.6 kernels; we used to use
    rv = sock_sendmsg(sock, &msg, iov.iov_len);
then later changed to
    rv = kernel_sendmsg(sock, &msg, &iov, 1, size);
when we should have used
    rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len);

tcp_sendmsg() used to totally ignore the size parameter. Commit 57be5bda ("ip: convert tcp_sendmsg() to iov_iter primitives") changes that, and exposes our long-standing error.

Even with this error exposed, to trigger the bug we would need an environment (config or otherwise) causing us to not use sendpage() for larger transfers, a failing connection, and have it fail "just at the right time". Apparently that was unlikely enough for most, so this went unnoticed for years. Still, it is known to trigger at least some of these, and suspected for the others:
[0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html
[1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html
[2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546
[3] https://ubuntuforums.org/showthread.php?t=2336150
[4] http://e2.howsolveproblem.com/i/1175162/

This should go into 4.9, and into all stable branches since and including v4.0, which is the first to contain the exposing change. It is correct for all older stable branches as well (those that contain the DRBD driver, i.e. 2.6.33 and up). It requires a small "conflict" resolution for v4.4 and earlier; with v4.5 we dropped the comment block immediately preceding the kernel_sendmsg().

Fixes: b411b363 ("The DRBD driver")
Cc: <stable@vger.kernel.org> # 2.6.33.x-
Cc: viro@zeniv.linux.org.uk
Cc: christoph.lechleitner@iteg.at
Cc: wolfgang.glas@iteg.at
Reported-by: Christoph Lechleitner <christoph.lechleitner@iteg.at>
Tested-by: Christoph Lechleitner <christoph.lechleitner@iteg.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
[changed oneliner to be "obvious" without context; more verbose message]
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 07 Aug, 2016 1 commit
-
-
Jens Axboe authored
Since commit 63a4cc24, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokenness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit.

Signed-off-by: Jens Axboe <axboe@fb.com>
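A sketch of the effect on driver code, assuming the new member name is bi_opf (the message above does not name it):

    /* old out-of-tree style; after the rename this no longer compiles:
     *     if (bio->bi_rw & REQ_SYNC)
     *             handle_sync_write(bio);
     */

    /* rebuilt against the renamed member, reading the op via bio_op(): */
    if (bio_op(bio) == REQ_OP_WRITE && (bio->bi_opf & REQ_SYNC))
            handle_sync_write(bio);         /* hypothetical helper */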
-
- 14 Jun, 2016 5 commits
-
-
Fabian Frederick authored
This contains various cosmetic fixes ranging from simple typos to const-ifying and using booleans properly.

Original commit messages from Fabian's patch set:
  drbd: debugfs: constify drbd_version_fops
  drbd: use seq_put instead of seq_print where possible
  drbd: include linux/uaccess.h instead of asm/uaccess.h
  drbd: use const char * const for drbd strings
  drbd: kerneldoc warning fix in w_e_end_data_req()
  drbd: use unsigned for one bit fields
  drbd: use bool for peer is_ states
  drbd: fix typo
  drbd: use | for bitmask combination
  drbd: use true/false for bool
  drbd: fix drbd_bm_init() comments
  drbd: introduce peer state union
  drbd: fix maybe_pull_ahead() locking comments
  drbd: use bool for growing
  drbd: remove redundant declarations
  drbd: replace if/BUG by BUG_ON

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
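Two of the recurring patterns from that list, sketched with hypothetical names (the actual DRBD declarations differ):

    struct peer_flags {
            unsigned int is_discard_ok:1;   /* one-bit fields declared unsigned */
            unsigned int is_write_same_ok:1;
    };

    /* const data behind a const pointer for fixed string tables */
    static const char * const state_names[] = {
            "StandAlone", "WFConnection", "Connected",
    };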
-
Lars Ellenberg authored
We will support WRITE_SAME, if
  * all peers support WRITE_SAME (both in kernel and DRBD version),
  * all peer devices support WRITE_SAME,
  * logical_block_size is identical on all peers.

We may at some point introduce a fallback on the receiving side for devices/kernels that do not support WRITE_SAME, by open-coding a submit loop. But not yet.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Philipp Reisner authored
If during resync we read only zeroes for a range of sectors, assume that these sectors can be discarded on the sync target node.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
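A minimal sketch of the all-zero test such a check can rest on; memchr_inv() is the usual kernel helper for this, though the actual DRBD predicate may differ:

    #include <linux/string.h>

    /* true if every byte of the resync buffer is zero */
    static bool buffer_is_all_zero(const void *buf, size_t len)
    {
            return memchr_inv(buf, 0, len) == NULL;
    }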
-
Lars Ellenberg authored
The intention was to only suspend IO if some normal bitmap operation is supposed to be locked out, not always. If the bulk operation is flagged as BM_LOCKED_CHANGE_ALLOWED, we do not need to suspend IO.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
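Roughly, the suspend becomes conditional on the lock flags; a sketch (exact placement and helper names unverified here):

    if (!(flags & BM_LOCKED_CHANGE_ALLOWED))
            drbd_suspend_io(device);

    /* ... run the bulk bitmap operation ... */

    if (!(flags & BM_LOCKED_CHANGE_ALLOWED))
            drbd_resume_io(device);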
-
- 07 Jun, 2016 2 commits
-
-
Mike Christie authored
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
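The distinction, sketched for that kernel generation (bi_rw still carried the flags at this point; driver helpers hypothetical):

    /* upper layers ask for "flush the cache before this write": */
    if (bio->bi_rw & REQ_PREFLUSH)          /* was: REQ_FLUSH */
            start_preflush(device);         /* hypothetical */

    /* request_fn drivers instead see a dedicated flush operation: */
    if (req_op(rq) == REQ_OP_FLUSH)
            issue_cache_flush(device);      /* hypothetical */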
-
Mike Christie authored
Separate the op from the rq_flag_bits and have drbd set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
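A sketch of the two helpers named above, with placeholder flags:

    /* setting: op and flags go in through the accessor, not a raw OR */
    bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);

    /* getting: dispatch on the op, not on WRITE/READ bits */
    switch (bio_op(bio)) {
    case REQ_OP_READ:
    case REQ_OP_WRITE:
    case REQ_OP_DISCARD:
            break;
    }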
-
- 12 Apr, 2016 1 commit
-
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 27 Jan, 2016 1 commit
-
-
Herbert Xu authored
This patch replaces uses of the long obsolete hash interface with either shash (for non-SG users) or ahash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
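For the non-SG case, a one-shot shash digest looks roughly like this (algorithm name chosen for illustration, error handling omitted; a sketch, not DRBD's actual code):

    #include <crypto/hash.h>

    struct crypto_shash *tfm = crypto_alloc_shash("md5", 0, 0);
    SHASH_DESC_ON_STACK(desc, tfm);

    desc->tfm = tfm;
    crypto_shash_digest(desc, data, len, out);   /* data/len/out provided by caller */
    crypto_free_shash(tfm);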
-
- 25 Nov, 2015 8 commits
-
-
Lars Ellenberg authored
Disconnect should wait for pending bitmap IO. But if that bitmap IO is not happening, because it is waiting for pending application IO, and there is no progress, because the fencing policy suspended application IO because of the disconnect, then we deadlock. The bitmap writeout in this case does not care for concurrent application IO, so there is no point waiting for it.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Lars Ellenberg authored
lsblk should be able to conveniently pick up stacking device driver relations involving DRBD. Even though the upstream kernel since 2011 says "DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.", a new user has been added since (bcache), which sets the precedent for us to use it as well.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
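The API carrying that warning is presumably bd_link_disk_holder(); a sketch of how a stacking driver announces the relation (member names hypothetical):

    /* backing device -> DRBD gendisk, so lsblk can show the stack */
    err = bd_link_disk_holder(backing_bdev, device->vdisk);
    if (err)
            return err;

    /* on detach / cleanup */
    bd_unlink_disk_holder(backing_bdev, device->vdisk);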
-
Lars Ellenberg authored
When accessing our meta data area on disk, we double check the plausibility of the requested sector offsets, and are very noisy about it if they look suspicious. During initial read of our "superblock", for "external" meta data, this triggered because the range estimate returned by drbd_md_last_sector() was still wrong.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Lars Ellenberg authored
Since kernel 3.3, we can use snprintf-style arguments to create a workqueue.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
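A sketch of what that buys, assuming alloc_ordered_workqueue() is the allocator in question; the resource name no longer has to be pre-assembled into a separate buffer:

    /* before: snprintf(name, sizeof(name), "drbd_%s", resource_name);
     *         wq = create_singlethread_workqueue(name);
     */
    wq = alloc_ordered_workqueue("drbd_%s", WQ_MEM_RECLAIM, resource_name);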
-
Philipp Reisner authored
The intention is to reduce CPU utilization. Recent measurements unveiled that the current performance bottleneck is CPU utilization on the receiving node: the asender thread became CPU limited. One of the main points is to eliminate the idr_for_each_entry() loop from the ack-sending code path. One exception to that is sending back ping_acks; these stay in the ack-receiver thread. Otherwise the logic becomes too complicated for no added value.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Philipp Reisner authored
This prepares the next patch where the sending on the meta (or control) socket is moved to a dedicated workqueue.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Andreas Gruenbacher authored
Instead of using a rwlock for synchronizing state changes across resources, take the request locks of all resources for global state changes. Use resources_mutex to serialize global state changes.

This means that taking the request lock of a resource is now enough to prevent changes of that resource. (Previously, a read lock on the global state lock was needed as well.)

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
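The per-resource fast path this enables, sketched with a hypothetical helper:

    /* a state change confined to one resource only needs its req_lock now;
     * no global read lock is taken anymore */
    spin_lock_irq(&resource->req_lock);
    __apply_local_state_change(device, new_state);   /* hypothetical */
    spin_unlock_irq(&resource->req_lock);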
-
Andreas Gruenbacher authored
Also change the enum values to all-capital letters.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 13 Aug, 2015 1 commit
-
-
Kent Overstreet authored
As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@kernel.org>
Cc: ceph-devel@vger.kernel.org
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 02 Jun, 2015 1 commit
-
-
Tejun Heo authored
Currently, a bdi (backing_dev_info) embeds a single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently.

This patch moves bdi->state into wb.
  * enum bdi_state is renamed to wb_state and the prefix of all enums is changed from BDI_ to WB_.
  * Explicit zeroing of bdi->state is removed without adding zeroing of wb->state as the whole data structure is zeroed on init anyway.
  * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->state are mechanically replaced with bdi->wb.state, introducing no behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: drbd-dev@lists.linbit.com
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
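The mechanical replacement, sketched for one driver-side test (the exact WB_ flag name is assumed from the prefix change described above):

    /* before */
    if (test_bit(BDI_registered, &bdi->state))
            kick_writeback(bdi);            /* hypothetical */

    /* after: same bit, now living in the embedded wb */
    if (test_bit(WB_registered, &bdi->wb.state))
            kick_writeback(bdi);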
-
- 25 Mar, 2015 1 commit
-
-
David Rientjes authored
Mempools created for slab caches should use mempool_create_slab_pool().

Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
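The conversion this implies, sketched with placeholder names:

    /* open-coded slab mempool */
    pool = mempool_create(POOL_SIZE, mempool_alloc_slab,
                          mempool_free_slab, request_cache);

    /* equivalent, via the dedicated helper */
    pool = mempool_create_slab_pool(POOL_SIZE, request_cache);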
-
- 10 Nov, 2014 2 commits
-
-
Andreas Gruenbacher authored
Avoid generic netlink calls in other parts of the code base.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Andreas Gruenbacher authored
* Update comments
* drbd_set_{in,out_of}_sync(): Remove unused parameters
* Move common code into adm_del_resource()
* Redefine ERR_MINOR_EXISTS -> ERR_MINOR_OR_VOLUME_EXISTS

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 11 Sep, 2014 1 commit
-
-
Andreas Gruenbacher authored
Rename local variable 'ds' to 'disk_state' or 'data_size', and 'dgs' to 'digest_size'.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 10 Jul, 2014 14 commits
-
-
Lars Ellenberg authored
Don't error out with a misleading "out of memory" if the cpu-mask has more bits set than there are CPUs. Just truncate to nr_cpu_ids implicitly.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
* Add details about pending meta data operations to in_flight_summary.

* Report number of requests waiting for activity log transactions.

* Add timing details of peer_requests to in_flight_summary.

* FLUSH details
  DRBD divides the incoming request stream into "epochs", in which peers are allowed to re-order writes independently. These epochs are separated by P_BARRIER on the replication link. Such barrier packets, depending on configuration, may cause the receiving side to drain the lower level device request queues and call blkdev_issue_flush(). This is known to be another major source of latency in DRBD. Track timing details of calls to blkdev_issue_flush(), and add them to in_flight_summary.

* data socket stats
  To be able to diagnose bottlenecks and root causes of "slow" IO on DRBD, it is useful to see network buffer stats along with the timing details of requests, peer requests, and meta data IO.

* Add pending bitmap IO timing details to in_flight_summary.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
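A sketch of the flush-timing part, with a hypothetical field to hold the measurement:

    unsigned long start_jif = jiffies;

    blkdev_issue_flush(device->ldev->backing_bdev, GFP_NOIO, NULL);

    /* exposed later via in_flight_summary (field name hypothetical) */
    device->last_flush_jif = jiffies - start_jif;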
-
Lars Ellenberg authored
Add new debugfs hierarchy:

  /sys/kernel/debug/drbd/
      resources/
          $resource_name/connections/peer/$volume_number/
          $resource_name/volumes/$volume_number/
      minors/$minor_number -> ../resources/$resource_name/volumes/$volume_number/

Followup commits will populate this hierarchy with files containing statistics, diagnostic information and some attribute data.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
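Creating such a tree with the debugfs API, sketched with the directory names above (variable names hypothetical):

    struct dentry *drbd_root, *resources_dir, *res_dir;

    drbd_root     = debugfs_create_dir("drbd", NULL);
    resources_dir = debugfs_create_dir("resources", drbd_root);
    res_dir       = debugfs_create_dir(resource->name, resources_dir);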
-
Lars Ellenberg authored
Track start and submit time of bitmap operations, and add pending bitmap IO contexts to a new pending_bitmap_io list.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Background resynchronisation does some "side-stepping", or throttles itself, if it detects application IO activity and the current resync rate estimate is above the configured "c-min-rate".

What was not detected: if there is no application IO because it blocks on activity log transactions. Introduce a new atomic_t ap_actlog_cnt, tracking such blocked requests, and count non-zero as application IO activity. This counter is exposed at proc_details level 2 and above. Also make sure to release the currently locked resync extent if we side-step due to such voluntary throttling.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
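The counter usage, sketched (helper names hypothetical):

    /* request has to wait for an activity log transaction */
    atomic_inc(&device->ap_actlog_cnt);
    /* ... AL transaction committed, request proceeds ... */
    atomic_dec(&device->ap_actlog_cnt);

    /* resync throttling now also treats blocked requests as activity */
    bool app_io_active = atomic_read(&device->ap_actlog_cnt) != 0 ||
                         other_application_io(device);      /* hypothetical */
    if (app_io_active)
            throttle_resync(device);                        /* hypothetical */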
-
Lars Ellenberg authored
Adding requests to per-device fifo lists as soon as possible after allocating them leaves a simple list_first_entry_or_null() to find the oldest request, regardless what it is still waiting for.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
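With the list ordered by allocation time, the lookup collapses to one call; a sketch with hypothetical list and member names:

    struct drbd_request *oldest =
            list_first_entry_or_null(&device->pending_requests,    /* hypothetical */
                                     struct drbd_request, list);   /* hypothetical */
    if (oldest)
            report_oldest_request(oldest);                          /* hypothetical */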
-
Lars Ellenberg authored
Record (in jiffies) how much time a request spends in which stages. Followup commits will use and present this additional timing information so we can better locate and tackle the root causes of latency spikes, or present the backlog for asynchronous replication.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
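Per-stage stamping in jiffies, sketched with hypothetical field names:

    req->start_jif      = jiffies;          /* request allocated / queued */
    /* ... */
    req->in_actlog_jif  = jiffies;          /* activity log transaction done */
    /* ... */
    req->pre_submit_jif = jiffies;          /* handed to the lower device */

    /* later, when presenting the data: */
    seq_printf(m, "%u ms in AL\n",
               jiffies_to_msecs(req->in_actlog_jif - req->start_jif));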
-
Lars Ellenberg authored
For diagnostic purposes, track intent, start time and latest submit time of meta data IO. Move separate members from struct drbd_device into the embedded struct drbd_md_io. s/md_io_(page|in_use)/md_io.\1/

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
drbd_destroy_device means to give up reference counts on the connection(s) reachable via the peer_device(s). It must not do that by iterating via device->resource->connections; resource and connections may have already been disassociated by drbd_free_resource, and we'd leak connection refs. Instead, iterate via device->peer_devices->connection.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
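Roughly, the corrected iteration (a sketch; list member names assumed from the description, other cleanup omitted):

    struct drbd_peer_device *peer_device, *tmp;

    list_for_each_entry_safe(peer_device, tmp, &device->peer_devices, peer_devices)
            kref_put(&peer_device->connection->kref, drbd_destroy_connection);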
-
Lars Ellenberg authored
Now that we have additional asynchronous kref_get/kref_put via debugfs, make sure we catch access after free. Poison struct drbd_device, drbd_connection and drbd_resource before kfree() with 0xfd, 0xfc, and 0xf2, respectively.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
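The poisoning itself is a plain memset right before the free; a sketch for one of the three structs:

    /* any later use of a stale pointer trips over an obvious 0xfd pattern */
    memset(device, 0xfd, sizeof(*device));
    kfree(device);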
-
Lars Ellenberg authored
Cosmetic change only.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
If "dirty" blocks are written to during resync, that brings them in-sync. By explicitly requesting write-acks during resync even in protocol != C, we now can actually respect this. Signed-off-by:
Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by:
Lars Ellenberg <lars.ellenberg@linbit.com>
-
Andreas Gruenbacher authored
Get rid of dump_stack() debug statements. There is no point whatsoever in registering and unregistering a reboot notifier that doesn't do anything. The intention was to switch to an "emergency read-only" mode, so we won't have to resync the full activity log just because we had been Primary before the reboot. Once we have that implemented, we may re-introduce the reboot notifier.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-