{"created":"2023-06-19T11:45:23.277022+00:00","id":15694,"links":{},"metadata":{"_buckets":{"deposit":"d3640646-8c3e-42c4-baea-c20d451abc38"},"_deposit":{"created_by":15,"id":"15694","owners":[15],"pid":{"revision_id":0,"type":"depid","value":"15694"},"status":"published"},"_oai":{"id":"oai:mie-u.repo.nii.ac.jp:00015694","sets":["366:640:641:1384"]},"author_link":["52210","52208"],"item_7_biblio_info_6":{"attribute_name":"書誌情報","attribute_value_mlt":[{"bibliographicIssueDates":{"bibliographicIssueDate":"2023-03","bibliographicIssueDateType":"Issued"}}]},"item_7_contributor_61":{"attribute_name":"修士論文指導教員","attribute_value_mlt":[{"contributorNames":[{"contributorName":"大野, 和彦","lang":"ja"}],"nameIdentifiers":[{"nameIdentifier":"52210","nameIdentifierScheme":"WEKO"}]}]},"item_7_description_14":{"attribute_name":"フォーマット","attribute_value_mlt":[{"subitem_description":"application/pdf","subitem_description_type":"Other"}]},"item_7_description_4":{"attribute_name":"抄録","attribute_value_mlt":[{"subitem_description":"近年,GPUを用いて汎用計算を行うGPGPUの利用が活発である.しかし,CUDA などの現在主流のGPU プログラミング開発環境ではGPUアーキテクチャを強く意識してプログラミングする必要がある.これに対し,我々は低レベルコードの記述量を減らした開発環境であるMESI-CUDAを開発している.GPU は高速・小容量のシェアードメモリを複数搭載しており,MESI-CUDA はこれを利用するコードを自動生成するが,現在の手法ではデータサイズが大きい場合にうまく扱えない.本稿では,この問題を解決するデータ入れ替え機構を提案する.また,同じシェアードメモリを共有するスレッドのグループであるブロックの大きさを適切に指定するのが難しいという問題もあり,このブロックサイズの自動化手法も提案する.さらに,ブロックの集まりであるグリッドの次元が1 次元のみであったのに対し,2 次元まで対応できるように拡張する.\nシェアードメモリ入れ替え手法では大規模データの場合シェアードメモリにそのまま格納できないという問題に対し,シェアードメモリに格納可能な最大データサイズの情報をもとにシェアードメモリに格納できるサイズのデータに分割し,データを入れ替えている.これによりシェアードメモリに格納できないサイズのデータの場合でもシェアードメモリを利用できるようになり,速度を向上させることに成功した.\nブロックサイズ自動決定手法ではブロックサイズの最大が1024 であることを考慮し,データサイズの8 分の1 と1024 を比較し,小さい方をブロックサイズとしている.これにより最適なブロックサイズを必ず選択することは出来ないが,速度低下が著しくないブロックサイズを選択することが可能となり,実行速度への影響を抑えながらブロックサイズの指定を不要にした.\n2 次元グリッドの適用ではMESI-CUDAの記法を変更し,スレッドマッピングに制限を付けることによりブロックの集まりであるグリッドの次元を2 
次元まで対応させた.これによりカーネル関数を複数回呼び出す必要性がなくなり,並列度が上がることでカーネル関数呼び出しのオーバヘッドが減ったため,実行速度が向上した.\nこれらの手法を適用したプログラムを評価した結果,従来手法と比べて実行速度が向上した.また,従来手法と比較してMESI-CUDA コードにおける低レベルコードの記述量が減少した.","subitem_description_type":"Abstract"},{"subitem_description":"In recent years, GPGPU, the use of GPUs for general-purpose computation, has become widespread. However, mainstream GPU programming environments such as CUDA require the programmer to be strongly aware of the GPU architecture. In contrast, we are developing MESI-CUDA, a development environment that reduces the amount of low-level code that must be written. GPUs are equipped with multiple fast, small-capacity shared memories, and MESI-CUDA automatically generates code that uses them; however, the current method cannot handle large data well. In this paper, we propose a data swapping mechanism to solve this problem. Because it is also difficult to specify an appropriate size for blocks, the groups of threads that share the same shared memory, we propose a method that determines the block size automatically. Furthermore, we extend the grid, a collection of blocks, which previously supported only one dimension, to support two dimensions.\nThe shared memory swapping method addresses the problem that large data cannot be stored in shared memory as is: based on the maximum data size that can be stored in shared memory, the data is divided into chunks of storable size, which are swapped in and out. This makes shared memory usable even for data that exceeds its capacity, thereby improving execution speed.\nThe automatic block size determination method takes into account that the maximum block size is 1024: it compares one-eighth of the data size with 1024 and uses the smaller value as the block size. 
Although this does not always select the optimal block size, it selects one that does not degrade speed significantly, eliminating the need to specify the block size while keeping the impact on execution speed small.\nTo support 2D grids, we changed the MESI-CUDA notation and restricted the thread mapping so that the grid, a collection of blocks, can have up to two dimensions. This eliminates the need to call a kernel function multiple times; the increased parallelism and the reduced kernel-call overhead improved execution speed.\nAn evaluation of programs using these methods showed that execution speed improved over the conventional method, and that the amount of low-level code in the MESI-CUDA code decreased compared to the conventional method.","subitem_description_type":"Abstract"}]},"item_7_description_5":{"attribute_name":"内容記述","attribute_value_mlt":[{"subitem_description":"三重大学大学院工学研究科 情報工学専攻 コンピュータアーキテクチャ研究室","subitem_description_type":"Other"},{"subitem_description":"36p","subitem_description_type":"Other"}]},"item_7_publisher_30":{"attribute_name":"出版者","attribute_value_mlt":[{"subitem_publisher":"三重大学"}]},"item_7_text_31":{"attribute_name":"出版者(ヨミ)","attribute_value_mlt":[{"subitem_text_value":"ミエダイガク"}]},"item_7_text_65":{"attribute_name":"資源タイプ(三重大)","attribute_value_mlt":[{"subitem_text_value":"Master's Thesis / 修士論文"}]},"item_7_version_type_15":{"attribute_name":"著者版フラグ","attribute_value_mlt":[{"subitem_version_resource":"http://purl.org/coar/version/c_970fb48d4fbd8a85","subitem_version_type":"VoR"}]},"item_creator":{"attribute_name":"著者","attribute_type":"creator","attribute_value_mlt":[{"creatorNames":[{"creatorName":"齊藤, 紡生","creatorNameLang":"ja"},{"creatorName":"サイトウ, 
ツムギ","creatorNameLang":"ja-Kana"}],"nameIdentifiers":[{}]}]},"item_files":{"attribute_name":"ファイル情報","attribute_type":"file","attribute_value_mlt":[{"accessrole":"open_date","date":[{"dateType":"Available","dateValue":"2023-05-22"}],"displaytype":"detail","filename":"2022ME0190.pdf","filesize":[{"value":"450.3 kB"}],"format":"application/pdf","licensetype":"license_note","mimetype":"application/pdf","url":{"label":"2022ME0190","url":"https://mie-u.repo.nii.ac.jp/record/15694/files/2022ME0190.pdf"},"version_id":"fe8d9f4c-6984-4ff9-947c-c65813efad2d"}]},"item_language":{"attribute_name":"言語","attribute_value_mlt":[{"subitem_language":"jpn"}]},"item_resource_type":{"attribute_name":"資源タイプ","attribute_value_mlt":[{"resourcetype":"thesis","resourceuri":"http://purl.org/coar/resource_type/c_46ec"}]},"item_title":"GPUのシェアードメモリ自動利用機構における大規模データへの対応","item_titles":{"attribute_name":"タイトル","attribute_value_mlt":[{"subitem_title":"GPUのシェアードメモリ自動利用機構における大規模データへの対応","subitem_title_language":"ja"}]},"item_type_id":"7","owner":"15","path":["1384"],"pubdate":{"attribute_name":"PubDate","attribute_value":"2023-05-22"},"publish_date":"2023-05-22","publish_status":"0","recid":"15694","relation_version_is_last":true,"title":["GPUのシェアードメモリ自動利用機構における大規模データへの対応"],"weko_creator_id":"15","weko_shared_id":-1},"updated":"2023-12-04T04:38:50.048765+00:00"}