Posted to issues@kudu.apache.org by "YangSong (Jira)" <ji...@apache.org> on 2019/10/31 10:42:00 UTC

[jira] [Comment Edited] (KUDU-2975) Spread WAL across multiple data directories

    [ https://issues.apache.org/jira/browse/KUDU-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963856#comment-16963856 ] 

YangSong edited comment on KUDU-2975 at 10/31/19 10:41 AM:
-----------------------------------------------------------

Thank you, let me summarize the implementation:

1. We need to add a new gflag, such as {{--fs_wal_dirs}}, to support spreading the WAL across multiple directories, and we should keep {{--fs_wal_dir}} around for backwards compatibility. Users can choose one of the two.
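The mutual exclusion between the two flags could be enforced with a simple validator. A rough sketch (function names are hypothetical, not from the Kudu codebase):

{code:cpp}
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a comma-separated --fs_wal_dirs value into paths.
std::vector<std::string> SplitWalDirsFlag(const std::string& flag_value) {
  std::vector<std::string> dirs;
  std::stringstream ss(flag_value);
  std::string dir;
  while (std::getline(ss, dir, ',')) {
    if (!dir.empty()) dirs.push_back(dir);
  }
  return dirs;
}

// Hypothetical validator: at most one of --fs_wal_dir / --fs_wal_dirs may be set.
bool ValidateWalDirFlags(const std::string& fs_wal_dir,
                         const std::string& fs_wal_dirs) {
  return fs_wal_dir.empty() || fs_wal_dirs.empty();
}
{code}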

2. The first time the FsManager is initialized, it needs to generate an instance file per WAL directory. If the data directories ({{--fs_data_dirs}}) are not provided, we use the write-ahead log directories ({{--fs_wal_dirs}}) as data directories. If the metadata directory is not provided, we use the first WAL directory or the first data directory. If one of the WAL directories doesn't exist, report a fatal error. If some of the WAL directories have an 'instance' file but others do not, report a fatal error.
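A minimal sketch of the fallback rules above (names hypothetical, error handling omitted):

{code:cpp}
#include <string>
#include <vector>

// If --fs_data_dirs is not given, the WAL directories double as data directories.
std::vector<std::string> ResolveDataDirs(const std::vector<std::string>& wal_dirs,
                                         const std::vector<std::string>& data_dirs) {
  return data_dirs.empty() ? wal_dirs : data_dirs;
}

// If no metadata directory is given, fall back to the first WAL directory,
// then to the first data directory.
std::string ResolveMetadataDir(const std::string& metadata_dir,
                               const std::vector<std::string>& wal_dirs,
                               const std::vector<std::string>& data_dirs) {
  if (!metadata_dir.empty()) return metadata_dir;
  if (!wal_dirs.empty()) return wal_dirs.front();
  return data_dirs.front();
}
{code}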

3. Add a class WalDirManager, maybe like this:
{code:java}
class WalDirManager {
 public:
  static Status Create(CanonicalizedRootsList wal_fs_roots,
                       std::unique_ptr<WalDirManager>* wal_manager);
  static Status Open(CanonicalizedRootsList wal_fs_roots,
                     std::unique_ptr<WalDirManager>* wal_manager);
  ~WalDirManager();
  void Shutdown();
  Status LoadWalDirFromPB(const std::string& tablet_id, const WalDirPB& pb);
  std::set<std::string> FindTabletsByWalDir(const std::string& wal_dir) const;
  Status FindWalDirByTabletId(const std::string& tablet_id, std::string* wal_dir) const;
  Status MarkWalDirsFailed(const std::string& error_message = "");
  void MarkWalDirFailed(const std::string& dir);
  bool IsWalDirFailed(const std::string& dir) const;
  std::set<std::string> GetFailedWalDirs() const;
  std::vector<std::string> GetWalDirs() const;
  std::string GetWalDirByUuid(const std::string& uuid) const;
  Status CreateWalDir(const std::string& tablet_id);

 private:
  explicit WalDirManager(CanonicalizedRootsList canonicalized_wal_roots);

  const CanonicalizedRootsList canonicalized_wal_fs_roots_;

  typedef std::unordered_map<std::string, std::string> DirByUuidMap;
  DirByUuidMap dir_by_uuid_;

  typedef std::multimap<std::string, std::string> TabletsByDirMap;
  TabletsByDirMap tablets_by_dir_;

  typedef std::set<std::string> FailedWalDirSet;
  FailedWalDirSet failed_wal_dirs_;
};{code}
 * We need to update the "instance" file under each WAL dir when creating a new WalDirManager. Each WAL directory generates its own uuid and records it in its instance file.
 * The directory structure may look like this:

{panel:title=structure of one WAL directory}
wal/
├── instance
└── wals/
    ├── tablet1_uuid/
    │   ├── index.0
    │   └── wal.0
    └── tablet2_uuid/
        ├── index.0
        └── wal.0
{panel}
 * When creating metadata for a tablet, we need to determine the WAL directory for that tablet, and record the uuid of the chosen directory in the tablet's metadata via WalDirPB.
 * The WAL directory for a tablet is chosen by calling "WalDirManager::CreateWalDir()". A simple policy is to track how many tablets each WAL directory holds and select the directory with the fewest tablets each time.
 * When deleting a tablet, we need to remove the corresponding entry from "TabletsByDirMap". For a tombstoned tablet, we also need to clear the WAL dir from its metadata.
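The least-loaded selection policy mentioned above could look roughly like this (a sketch only, not the proposed CreateWalDir() signature):

{code:cpp}
#include <climits>
#include <map>
#include <set>
#include <string>

// Pick the healthy WAL directory that currently holds the fewest tablets.
// tablet_count_by_dir maps each WAL directory to its current tablet count;
// failed_dirs contains directories that have been marked failed.
std::string PickLeastLoadedWalDir(const std::map<std::string, int>& tablet_count_by_dir,
                                  const std::set<std::string>& failed_dirs) {
  std::string best;
  int best_count = INT_MAX;
  for (const auto& entry : tablet_count_by_dir) {
    if (failed_dirs.count(entry.first) > 0) continue;  // skip failed dirs
    if (entry.second < best_count) {
      best = entry.first;
      best_count = entry.second;
    }
  }
  return best;  // empty if every directory has failed
}
{code}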

4. After we've passed the initial FsManager checks and start bootstrapping, if a tablet's metadata is missing WAL directory information and the tablet is not tombstoned, we mark the tablet failed. If the metadata is OK but the tablet has rowsets and its WAL is missing (e.g. the "tablet1_uuid" directory is missing; if the whole "wal" directory is missing, Kudu will crash while checking the FsManager), we also mark the tablet failed. I did a test with the latest Kudu version: after I removed some tablets' WALs and restarted the tserver, it started with errors like "Tablet failed to bootstrap: Illegal state: Found rowsets but no log segments could be found.". If the tserver was restarted immediately, the tablet would be recovered by Raft. If we waited a few minutes before restarting, the tablet would already have been re-replicated to another tserver, and the local replica would be tombstoned.

5. If a disk IO error is reported while reading or writing a WAL file/directory, this is similar to what we do for data directory failures. We may need to change the function "FailTabletsInDataDir(string uuid)" to "FailTabletsInDir(DirType type, string uuid)", where "DirType" identifies whether the directory is a data directory or a WAL directory.
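A rough sketch of the generalized signature (the dispatch bodies are placeholders that return a description string for illustration, not the real manager calls):

{code:cpp}
#include <string>

enum class DirType { kData, kWal };

// Hypothetical generalization of FailTabletsInDataDir(): one entry point that
// routes the failure to the right directory manager based on the type.
std::string FailTabletsInDir(DirType type, const std::string& uuid) {
  switch (type) {
    case DirType::kData:
      // Real code: look up the dir in DataDirManager and fail its tablets.
      return "failed tablets in data dir " + uuid;
    case DirType::kWal:
      // Real code: look up the dir in WalDirManager and fail its tablets.
      return "failed tablets in WAL dir " + uuid;
  }
  return "";
}
{code}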

6. We also need to update the code that handles "--fs_wal_dir" in the CLI tools.

Is this an accurate summary? There may be omissions or errors. This approach seems relatively simple and can solve the problem quickly.


> Spread WAL across multiple data directories
> -------------------------------------------
>
>                 Key: KUDU-2975
>                 URL: https://issues.apache.org/jira/browse/KUDU-2975
>             Project: Kudu
>          Issue Type: New Feature
>          Components: fs, tablet, tserver
>            Reporter: LiFu He
>            Priority: Major
>         Attachments: network.png, tserver-WARNING.png, util.png
>
>
> Recently, we deployed a new Kudu cluster where every node has 12 SSDs. Then we created a big table and loaded data into it through Flink. We noticed that the utilization of the single SSD used to store the WAL was 100% while the others were idle, so we suggest spreading the WAL across multiple data directories.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)